Search for: All records

Creators/Authors contains: "Oh, S"


  1. Free, publicly-accessible full text available May 7, 2026
  2. The assumption across nearly all language model (LM) tokenization schemes is that tokens should be subwords, i.e., contained within word boundaries. While providing a seemingly reasonable inductive bias, is this common practice limiting the potential of modern LMs? Whitespace is not a reliable delimiter of meaning, as evidenced by multi-word expressions (e.g., "by the way"), crosslingual variation in the number of words needed to express a concept (e.g., "spacesuit helmet" in German is "raumanzughelm"), and languages that do not use whitespace at all (e.g., Chinese). To explore the potential of tokenization beyond subwords, we introduce a "superword" tokenizer, SuperBPE, which incorporates a simple pretokenization curriculum into the byte-pair encoding (BPE) algorithm to first learn subwords, then superwords that bridge whitespace. This brings dramatic improvements in encoding efficiency: when fixing the vocabulary size to 200k, SuperBPE encodes a fixed piece of text with up to 33% fewer tokens than BPE on average. In experiments, we pretrain 8B transformer LMs from scratch while fixing the model size, vocabulary size, and train compute, varying *only* the algorithm for learning the vocabulary. Our model trained with SuperBPE achieves an average +4.0% absolute improvement over the BPE baseline across 30 downstream tasks (including +8.2% on MMLU), while simultaneously requiring 27% less compute at inference time. In analysis, we find that SuperBPE results in segmentations of text that are more uniform in per-token difficulty. Qualitatively, this may be because SuperBPE tokens often capture common multi-word expressions that function semantically as a single unit. SuperBPE is a straightforward, local modification to tokenization that improves both encoding efficiency and downstream performance, yielding better language models overall. 
    Free, publicly-accessible full text available April 14, 2026
  3. Diffusion models (DMs) create samples from a data distribution by starting from random noise and iteratively solving a reverse-time ordinary differential equation (ODE). Because each step in the iterative solution requires an expensive neural function evaluation (NFE), there has been significant interest in approximately solving these diffusion ODEs with only a few NFEs without modifying the underlying model. However, in the few NFE regime, we observe that tracking the true ODE evolution is fundamentally impossible using traditional ODE solvers. In this work, we propose a new method that learns a good solver for the DM, which we call Solving for the Solver (S4S). S4S directly optimizes a solver to obtain good generation quality by learning to match the output of a strong teacher solver. We evaluate S4S on six different pre-trained DMs, including pixel-space and latent-space DMs for both conditional and unconditional sampling. In all settings, S4S uniformly improves the sample quality relative to traditional ODE solvers. Moreover, our method is lightweight, data-free, and can be plugged in black-box on top of any discretization schedule or architecture to improve performance. Building on top of this, we also propose S4S-Alt, which optimizes both the solver and the discretization schedule. By exploiting the full design space of DM solvers, with 5 NFEs, we achieve an FID of 3.73 on CIFAR10 and 13.26 on MS-COCO, representing a 1.5× improvement over previous training-free ODE methods. 
    Free, publicly-accessible full text available February 24, 2026
  4. While it is well known that cosmic rays (CRs) can gain energy from turbulence via second-order Fermi acceleration, how this energy transfer affects the turbulent cascade remains largely unexplored. Here, we show that damping and steepening of the compressive turbulent power spectrum are expected once the damping time $$t_{\rm damp} \sim \rho v^{2}/\dot{E}_{\rm CR} \propto E_{\rm CR}^{-1}$$ becomes comparable to the turbulent cascade time. Magnetohydrodynamic simulations of stirred compressive turbulence in a gas-CR fluid with diffusive CR transport show clear imprints of CR-induced damping, saturating at $$\dot{E}_{\rm CR} \sim \tilde{\epsilon}$$, where $$\tilde{\epsilon}$$ is the turbulent energy input rate. In that case, almost all of the energy in large-scale motions is absorbed by CRs and does not cascade down to grid scale. Through a Hodge–Helmholtz decomposition, we confirm that purely compressive forcing can generate significant solenoidal motions, and we find preferential CR damping of the compressive component in simulations with diffusion and streaming, rendering small-scale turbulence largely solenoidal, with implications for thermal instability and proposed resonant scattering of $$E \gtrsim 300$$ GeV CRs by fast modes. When CR transport is streaming dominated, CRs also damp large-scale motions, with kinetic energy reduced by up to 1 order of magnitude in realistic $$E_{\rm CR} \sim E_{\rm g}$$ scenarios, but turbulence (with a reduced amplitude) still cascades down to small scales with the same power spectrum. Such large-scale damping implies that turbulent velocities obtained from the observed velocity dispersion may significantly underestimate turbulent forcing rates, i.e., $$\tilde{\epsilon} \gg \rho v^{3}/L$$.
  5. Spurred by rich, multiwavelength observations and enabled by new simulations, ranging from cosmological to subparsec scales, the past decade has seen major theoretical progress in our understanding of the circumgalactic medium (CGM). We review key physical processes in the CGM. Our conclusions include the following: ▪ The properties of the CGM depend on a competition between gravity-driven infall and gas cooling. When cooling is slow relative to free fall, the gas is hot (roughly virial temperature), whereas the gas is cold (T ∼ 10⁴ K) when cooling is rapid. ▪ Gas inflows and outflows play crucial roles, as does the cosmological environment. Large-scale structure collimates cold streams and provides angular momentum. Satellite galaxies contribute to the CGM through winds and gas stripping. ▪ In multiphase gas, the hot and cold phases continuously exchange mass, energy, and momentum. The interaction between turbulent mixing and radiative cooling is critical. A broad spectrum of cold gas structures, going down to subparsec scales, arises from fragmentation, coagulation, and condensation onto gas clouds. ▪ Magnetic fields, thermal conduction, and cosmic rays can substantially modify how the cold and hot phases interact, although microphysical uncertainties are presently large. Key open questions for future work include the mutual interplay between small-scale structure and large-scale dynamics, and how the CGM affects the evolution of galaxies.
  6. Polar codes are widely used state-of-the-art codes for reliable communication that have recently been included in the 5th generation wireless standards (5G). However, there still remains room for design of polar decoders that are both efficient and reliable in the short blocklength regime. Motivated by recent successes of data-driven channel decoders, we introduce a novel 𝐂ur𝐑𝐈culum based 𝐒equential neural decoder for 𝐏olar codes (CRISP). We design a principled curriculum, guided by information-theoretic insights, to train CRISP and show that it outperforms the successive-cancellation (SC) decoder and attains near-optimal reliability performance on the Polar(32,16) and Polar(64,22) codes. The choice of the proposed curriculum is critical in achieving the accuracy gains of CRISP, as we show by comparing against other curricula. More notably, CRISP can be readily extended to Polarization-Adjusted-Convolutional (PAC) codes, where existing SC decoders are significantly less reliable. To the best of our knowledge, CRISP constructs the first data-driven decoder for PAC codes and attains near-optimal performance on the PAC(32,16) code. 
  7. There is considerable evidence for widespread subsonic turbulence in galaxy clusters, most notably from Hitomi. Turbulence is often invoked to offset radiative losses in cluster cores, both by direct dissipation and by enabling turbulent heat diffusion. However, in a stratified medium, buoyancy forces oppose radial motions, making turbulence anisotropic. This can be quantified via the Froude number Fr, which decreases inward in clusters as stratification increases. We exploit analogies with MHD turbulence to show that wave–turbulence interactions increase cascade times and reduce dissipation rates ϵ ∝ Fr. Equivalently, for a given energy injection/dissipation rate ϵ, turbulent velocities u must be higher compared to Kolmogorov scalings. High-resolution hydrodynamic simulations show excellent agreement with the ϵ ∝ Fr scaling, which sets in for Fr ≲ 0.1. We also compare previously predicted scalings for the turbulent diffusion coefficient D ∝ Fr² and find excellent agreement, for Fr ≲ 1. However, we find a different normalization, corresponding to stronger diffusive suppression by more than an order of magnitude. Our results imply that turbulent diffusion is more heavily suppressed by stratification, over a much wider radial range, than turbulent dissipation. Thus, the latter potentially dominates. Furthermore, this shift implies significantly higher turbulent velocities required to offset cooling, compared to previous models. These results are potentially relevant to turbulent metal diffusion in the galaxy groups and clusters (which is likewise suppressed), and to planetary atmospheres.
  8. Understanding the survival, growth, and dynamics of cold gas is fundamental to galaxy formation. While there has been a plethora of work on ‘wind tunnel’ simulations that study such cold gas in winds, the infall of this gas under gravity is at least equally important, and fundamentally different since cold gas can never entrain. Instead, velocity shear increases and remains unrelenting. If these clouds are growing, they can experience a drag force due to the accretion of low-momentum gas, which dominates over ram pressure drag. This leads to subvirial terminal velocities, in line with observations. We develop simple analytic theory and predictions based on turbulent radiative mixing layers. We test these scalings in 3D hydrodynamic simulations, both for an artificial constant background and a more realistic stratified background. We find that the survival criterion for infalling gas is more stringent than in a wind, requiring that clouds grow faster than they are destroyed ($$t_{\rm grow} < 4\, t_{\rm cc}$$). This can be translated to a critical pressure, which for Milky Way-like conditions is $$P \sim 3000 \, k_{\rm B} \, {\rm K}\, {\rm cm}^{-3}$$. Cold gas that forms via linear thermal instability ($$t_{\rm cool}/t_{\rm ff} < 1$$) in planar geometry meets the survival threshold. In stratified environments, larger clouds need only survive infall until cooling becomes effective. We discuss applications to high-velocity clouds and filaments in galaxy clusters.
  9. Subsonic, compressive turbulence transfers energy to cosmic rays (CRs), a process known as nonresonant reacceleration. It is often invoked to explain the observed ratios of primary to secondary CRs at ∼GeV energies, assuming wholly diffusive CR transport. However, such estimates ignore the impact of CR self-confinement and streaming. We study these issues in stirring box magnetohydrodynamic (MHD) simulations using Athena++, with field-aligned diffusive and streaming CR transport. For diffusion only, we find CR reacceleration rates in good agreement with analytic predictions. When streaming is included, reacceleration rates depend on plasma β. Due to streaming-modified phase shifts between CR and gas variables, they are slower than canonical reacceleration rates in low-β environments like the interstellar medium but remain unchanged in high-β environments like the intracluster medium. We also quantify the streaming energy-loss rate in our simulations. For sub-Alfvénic turbulence, it is resolution dependent (hence unconverged in large-scale simulations) and heavily suppressed compared to the isotropic loss rate $$v_{\rm A} \cdot \nabla P_{\rm CR}/P_{\rm CR} \sim v_{\rm A}/L_0$$, due to misalignment between the mean field and isotropic CR gradients. Unlike acceleration efficiencies, CR losses are almost independent of magnetic field strength over β ∼ 1–100 and are, therefore, not the primary factor behind lower acceleration rates when streaming is included. While this paper is primarily concerned with how turbulence affects CRs, in a follow-up paper we consider how CRs affect turbulence by diverting energy from the MHD cascade, altering the pathway to gas heating and steepening the turbulent spectrum.
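
The SuperBPE entry in the list above describes a two-stage pretokenization curriculum: learn subword merges with whitespace boundaries enforced, then continue learning merges that may bridge whitespace. The following is a minimal toy sketch of that idea, not the authors' implementation. It assumes character-level symbols rather than bytes, a simple regex pretokenizer, and a tiny in-memory corpus; the function names and the parameters `num_subword_merges` and `num_superword_merges` are hypothetical.

```python
import re
from collections import Counter


def count_pairs(sequences):
    """Count adjacent symbol pairs over all sequences, weighted by frequency."""
    pairs = Counter()
    for symbols, freq in sequences.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def apply_merge(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` (left to right)."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_merges(sequences, num_merges):
    """Plain BPE loop: repeatedly merge the most frequent adjacent pair."""
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(sequences)
        if not pairs:
            break  # every sequence is already a single symbol
        best = max(pairs, key=lambda p: (pairs[p], p))  # deterministic ties
        merges.append(best)
        updated = Counter()
        for symbols, freq in sequences.items():
            updated[apply_merge(symbols, best)] += freq
        sequences = updated
    return merges


def segment(symbols, merges):
    """Apply learned merges in the order they were learned."""
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


def pretokenize(text):
    """Toy pretokenizer (assumption): each word keeps its leading whitespace."""
    return re.findall(r"\s*\S+", text)


def subword_segment(text, subword_merges):
    """Segment each word separately so no token crosses a word boundary."""
    return tuple(s for w in pretokenize(text)
                 for s in segment(tuple(w), subword_merges))


def train_sketch(text, num_subword_merges, num_superword_merges):
    # Stage 1: merges are learned inside whitespace-delimited words only.
    word_counts = Counter(tuple(w) for w in pretokenize(text))
    subword_merges = learn_merges(word_counts, num_subword_merges)
    # Stage 2: drop the word boundaries, so new merges may bridge whitespace.
    flat = subword_segment(text, subword_merges)
    superword_merges = learn_merges(Counter({flat: 1}), num_superword_merges)
    return subword_merges, superword_merges


def encode(text, subword_merges, superword_merges):
    return list(segment(subword_segment(text, subword_merges),
                        superword_merges))


corpus = "by the way by the way by the way"
sub, sup = train_sketch(corpus, num_subword_merges=20, num_superword_merges=2)
bpe_tokens = encode(corpus, sub, [])        # subword-only baseline
super_tokens = encode(corpus, sub, sup)     # with whitespace-bridging merges
print(len(bpe_tokens), len(super_tokens))
```

On this toy corpus the second stage produces tokens that span several words, so the same text is encoded with fewer tokens than the subword-only baseline. The sketch says nothing about the efficiency or downstream gains reported in the abstract, which depend on the real training data and vocabulary size.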